Effect of data resampling on feature importance in imbalanced blockchain data: comparison studies of resampling techniques

Abstract

Cryptocurrency blockchain data encounter a class-imbalance problem because only a few labels of illicit or fraudulent activities in the network are known. For this purpose, we seek to compare various resampling methods applied to two highly imbalanced datasets derived from Bitcoin and Ethereum after further dimensionality reductions, which is different from previous studies on these datasets. First, we study the performance of classical supervised learning methods in classifying transactions and accounts in the Bitcoin and Ethereum datasets, respectively. Next, we apply the resampling techniques using the best-performing algorithm on each dataset. Subsequently, we study the feature importance of the given models, wherein the resampled datasets directly influence the explainability of the model. Our main finding is that undersampling with the edited nearest-neighbour technique attains an accuracy of more than 99% by removing noisy points from the whole dataset. Moreover, the best-performing algorithms show superior performance after feature reduction in comparison with their original studies. The main contribution lies in discussing the effect of data resampling on feature importance, which is interconnected with explainable artificial intelligence (XAI) techniques.
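To make the edited nearest-neighbour idea in the abstract concrete, the following is a minimal pure-Python sketch of one common form of the rule: a majority-class point is dropped when the majority vote of its k nearest neighbours disagrees with its own label. It is not the authors' implementation. The function name, the choice k = 3, the majority-vote variant, and the toy coordinates are all assumptions made for illustration; the abstract does not state which library or settings were used.

```python
import math
from collections import Counter


def edited_nearest_neighbours(X, y, majority_label, k=3):
    """Undersample by dropping majority-class points that look like noise.

    A majority-class point is kept only if the most common label among its
    k nearest neighbours (Euclidean distance) matches its own label.
    Minority-class points are always kept.
    """
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != majority_label:
            keep.append(i)  # never remove minority samples
            continue
        # Indices of the k closest other points.
        neighbours = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(xi, X[j]),
        )[:k]
        votes = Counter(y[j] for j in neighbours)
        if votes.most_common(1)[0][0] == yi:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: class 0 is the majority (e.g. licit), class 1 the
# minority (e.g. illicit). The last point is labelled 0 but sits inside the
# class-1 cluster, so it plays the role of a noisy majority sample.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
     (5, 5), (5, 6), (6, 5), (5.2, 5.1)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 0]

X_res, y_res = edited_nearest_neighbours(X, y, majority_label=0, k=3)
print(len(X), "->", len(X_res), "samples;", Counter(y_res))
```

On this toy set only the mislabelled-looking point near the minority cluster is removed, which is the sense in which the technique cleans noisy points rather than balancing the classes to a fixed ratio.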

Similar resources

The Clustering and Classification Data Mining Techniques in Insurance Fraud Detection: The Case of Iranian Car Insurance

Given the ever-increasing spread of fraud in the insurance sector, especially in car insurance, and its negative consequences for insurance companies, applying suitable and efficient methods to identify and detect fraud in this area is essential. Understanding the patterns in data on previously reported claims can help determine whether a damage claim is genuine. One of the most common and widely used ways of discovering patterns in data is the use of ...

Using Unsupervised Learning to Guide Resampling in Imbalanced Data Sets

The class imbalance problem causes a classifier to overfit the data belonging to the class with the greatest number of training examples. The purpose of this paper is to argue that methods that equalize class membership are not as effective as possible when applied blindly and that improvements can be obtained by adjusting for the within-class imbalance. A guided resampling technique is proposed and...

A Multiple Resampling Method for Learning from Imbalanced Data Sets

Re-Sampling methods are commonly used for dealing with the class-imbalance problem. Their advantage over other methods is that they are external and thus, easily transportable. Although such approaches can be very simple to implement, tuning them most effectively is not an easy task. In particular, it is unclear whether oversampling is more effective than undersampling and which oversampling or...

Importance Resampling for BSSRDF

Rendering translucent effects using the diffusion model BSSRDF [Jensen et al. 2001] is still a difficult problem compared to traditional BSDF, because it introduces additional degrees of freedom. In a BSSRDF, the light does not scatter at a single point: it enters the surface at one point, and exits at another. Various techniques have been proposed to sample an exit point based on an entry point...

Journal

Journal title: Data Science and Management

Year: 2022

ISSN: 2666-7649

DOI: https://doi.org/10.1016/j.dsm.2022.04.003